It Begins…

This whole thing started after I read an excellent blog post by Julia Silge. You should definitely check it out, but the short version is that she used language processing tools within R to analyze the sentiment in Jane Austen’s novels. This led to the natural question: could I do this with rap albums? I didn’t learn R coding just to write a dissertation, right?!

In the same way that Julia looked at Austen’s entire body of work, I thought it would be interesting to look at one musical artist’s whole commercial output. Within hip hop, Kanye seems like a natural choice. He’s got seven albums out, which I’d argue encompass greater musical diversity than the catalog of probably any other recent popular musician. Some of his later albums, particularly 808s & Heartbreak and Yeezus, have been polarizing, even for fans of his early work. So his catalog seems ripe for a sentiment analysis. Would the sentiment analysis indicate major differences in feeling across his career arc? Is there really an “old Kanye” to miss, lyrically speaking?

Anyways, what follows is my best shot at this analysis. Major credit to Julia Silge here. My work followed pretty easily from the code she had already written. And since we’re dealing with hip hop here, let’s not say I stole her code. I sampled it. I remixed it. If you don’t care anything about how I actually did the analysis in R, you can skip The Geeky Stuff section and head right along to The Rap Stuff.

The Geeky Stuff

In thinking about the general problem, it seemed to me there were three major steps to the analysis:

  1. Get the lyrics for the albums that I want to analyze. Julia already had this solved for her work because she had the text of Austen’s novels packaged up nice and neat. I was starting from scratch.

  2. Perform the sentiment analysis. Julia already solved this problem for me.

  3. Visualize the results. Julia mostly solved this problem for me.

Here’s how I went about addressing these tasks:

  1. In thinking about how to get the lyrics I wanted, the answer seemed pretty obvious to me: the website Genius (formerly Rap Genius). If you haven’t checked out Genius, you should. It’s essentially a wiki-type site dedicated to song lyrics. Contributors both transcribe the lyrics and annotate them for meaning. For my purposes, I just needed to get the lyrics out of the webpages and into R, preferably for whole albums at once. To do this, I used the handy R package rvest. It was my first time using it, but it allowed me to scrape lyrics off the web quickly and easily. I wrote a few R functions that let me input the Genius page of the album I was interested in and pretty quickly process the entire album’s worth of lyrics.

  2. For the sentiment analysis, I just followed Julia’s lead. Major disclaimer: I know very little about sentiment analysis. My basic understanding is that it’s used to quantify the subjective feelings that are present in a written text. Julia has a nice comparison of metrics that might be used to compute sentiment scores, and she settled on the bing method from the package syuzhet since it did not seem to be biased or overly variable. Good enough for her, good enough for me.

  3. Again, I followed Julia’s example here. I thought her visuals using ggplot2 looked really nice, so I just modified her code slightly. Whereas Julia was analyzing an entire novel’s worth of text at once, my texts had clear dividing points since an album is made up of discrete tracks. Thus, I added some visual elements to the plots to help distinguish between tracks within an album.
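To make step 1 a little more concrete, here’s a minimal rvest sketch. It parses a small inline HTML snippet instead of a live Genius page, and the `.lyrics` class and the `[Hook]` marker are assumptions made up for this example. It is not Genius’s actual markup, and it is not the set of helper functions used for this post.

```r
library(rvest)
library(stringr)

# Hypothetical stand-in for a lyrics page; the real Genius markup differs
html <- '<div class="lyrics">
  <p>[Hook]</p>
  <p>First lyric line</p>
  <p>Second lyric line</p>
</div>'

page <- read_html(html)

# Pull the text of each paragraph inside the (assumed) lyrics container
lyric_lines <- page %>%
  html_nodes(".lyrics p") %>%
  html_text()

# Trim stray whitespace, then drop section markers such as [Hook]
lyric_lines <- str_trim(lyric_lines)
lyric_lines <- lyric_lines[!str_detect(lyric_lines, "^\\[.*\\]$")]

print(lyric_lines)
```

Scraping a real page would mean passing a URL to `read_html()` and swapping in whatever selector actually matches the lyrics on that page.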

All these tasks can be done with surprisingly few R packages. Here’s what I used:

library(dplyr)
library(rvest)
library(stringr)
library(syuzhet)
library(ggplot2)
library(png)

To summarize, I’m using dplyr for general data manipulation, rvest for web scraping, stringr for manipulation of lyrics once I get them into R, syuzhet to conduct the sentiment analysis, and ggplot2 and png to create the plots.
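For a feel of what the bing method returns, here’s a tiny syuzhet sketch on two made-up sentences (not lyrics from any album). As I understand it, each string gets +1 for every word found in the lexicon’s positive list and -1 for every word found in its negative list, so the score is just the sum.

```r
library(syuzhet)

# Made-up example sentences, not lyrics from any album
text_lines <- c("I love this beautiful day",
                "I hate this terrible mess")

# One score per string: positive lexicon words add, negative ones subtract
scores <- get_sentiment(text_lines, method = "bing")

print(scores)
```

The first sentence should come out above zero and the second below zero. Words the lexicon doesn’t know simply contribute nothing.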

The Rap Stuff

Ok, let’s get on with it already. But first, one revelation: I lied about this being just about Kanye. I wanted to “ground truth” the method with some other popular hip hop albums (and just because I thought it’d be fun). The first album that came to mind was Kendrick Lamar’s To Pimp A Butterfly, so that’s what I did first. For just this first plot, I’ll show you the code. Here’s the entire analysis for this album:

# To Pimp A Butterfly Analysis

# Get the track pages listed on the album's Genius page
tpab.htmls <- get_track_htmls(
  "http://genius.com/albums/Kendrick-lamar/To-pimp-a-butterfly", 
  "Kendrick")
# Keep entries 2 through 17: 16 entries, one per track label below
tpab.htmls <- tpab.htmls[2:17]

# Process the album's lyrics, chunk the text, and find the track splits
tpab.album <- process_album(tpab.htmls)
tpab.chunked <- chunk_text(tpab.album)
tpab.tracksplits <- get_track_splits(tpab.chunked, tpab.htmls)

# Score sentiment with the bing method
tpab.sentiment <- process_sentiment(tpab.album, "bing")

# Track-name annotations: x from find_middle(), y alternating between 11 and 9
tpab.anno <- data.frame(x = find_middle(tpab.tracksplits, tpab.chunked), 
                        y = rep(c(11, 9), 20)[1:length(tpab.tracksplits)], 
                        label = c("Wesley's Theory", "For Free?",
                                  "King Kunta", "Institutionalized",
                                  "These Walls", "u", "Alright",
                                  "For Sale?", "Momma",
                                  "Hood Politics", "How Much...",
                                  "Complexion...",
                                  "The Blacker The Berry",
                                  "You Ain't Gotta Lie...",
                                  "i", "Mortal Man"))
# Read the PNG image that gets passed to the plot
tpab.image <- readPNG("images/tpab.png")
# Build the plot, then add a title with the album name in italics
p.tpab <- 
  plot_sentiment(tpab.sentiment, tpab.anno, tpab.tracksplits, tpab.image)
p.tpab + 
  labs(title = 
         expression(paste("Sentiment in ", italic("To Pimp A Butterfly"))))
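The helper functions called above aren’t reproduced in this post. As a guess at what a chunking step could look like, here’s a base R sketch that pastes every ten consecutive lines into one string. The name `chunk_lines`, the chunk size of ten, and the placeholder lines are all assumptions for illustration, not the actual `chunk_text()`.

```r
# Made-up lines standing in for one album's lyrics
lyric_lines <- paste("lyric line", 1:25)

# Hypothetical chunker: paste every `size` consecutive lines into one string
chunk_lines <- function(lines, size = 10) {
  # Group index for each line: 1 for lines 1-10, 2 for lines 11-20, and so on
  groups <- ceiling(seq_along(lines) / size)
  unname(vapply(split(lines, groups), paste, character(1), collapse = " "))
}

chunks <- chunk_lines(lyric_lines, size = 10)

print(length(chunks))
```

With 25 lines and a size of ten, the last chunk is shorter than the others, which is worth keeping in mind when comparing scores across chunks.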

So you see that the resulting plot is very similar to what Julia produced in her analyses, except I’ve used vertical dotted lines to delineate tracks within an album. I also plotted positive sentiments in green and negative sentiments in red just to make the difference more obvious. I think this is also a good time to mention that I really have no idea how well this bing sentiment method may or may not be doing in analyzing a hip hop album. The Genius community certainly does a very thorough job of annotating albums, but I would imagine that much of the content of these albums is quite unusual for such an analysis. To give you an idea, here’s one of the chunks of To Pimp A Butterfly:

## [1] "I know everything, I know cars, clothes, hoes, and money I know loyalty, I know respect, I know those that's ornery I know everything, the highs, the lows, the groupies, the junkies I know if I'm generous at heart, I don't need recognition The way I'm rewarded, well, that's God's decision I know you know that line's for Compton School District Just give it to the kids, don’t gossip about how it was distributed I know how people work I know the price of life, I'm knowin' how much it’s worth I know what I know and I know it well not to ever forget"

If you’ve heard any of the albums I’ll be analyzing, you know it gets a lot more colorful than that. So I’d say that while these methods are capable of giving us answers, I’d take it all with a grain of salt.
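The plot styling described above (dotted vertical lines between tracks, green for positive, red for negative) can be sketched in ggplot2 along these lines. The scores and track boundaries are made up, and this is only a rough approximation of the idea, not the actual `plot_sentiment()` function.

```r
library(ggplot2)

# Made-up chunk scores for illustration only; not results from any album
sentiment <- data.frame(
  chunk = 1:8,
  score = c(2, -1, 3, -4, 1, -2, 0, 5)
)
sentiment$direction <- ifelse(sentiment$score >= 0, "positive", "negative")

# Hypothetical boundaries between tracks, placed between chunk positions
track_splits <- c(3.5, 6.5)

p <- ggplot(sentiment, aes(x = chunk, y = score, fill = direction)) +
  geom_col() +
  # Dotted vertical lines mark where one track ends and the next begins
  geom_vline(xintercept = track_splits, linetype = "dotted") +
  scale_fill_manual(values = c(positive = "darkgreen", negative = "red")) +
  labs(x = "Chunk", y = "Sentiment score")

print(p)
```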

I don’t think there are a ton of surprises here.

Since I did a Kendrick album, I felt obligated to give Drake some attention too. The President may have given Kendrick the nod, but what do the numbers look like? People could argue about this (and people I know will probably argue with me about this), but I think Take Care is probably his best album to this point, so let’s plot that one:

The Kanye Canon

Ok, we’re finally moving into the Kanye stuff. But first let’s just take a moment to remember the pure, luxurious soul this man gave us. You could have your first wedding dance to that. In fact, you probably should. Let’s also not forget when he helped Bob Simon understand a “dope ass beat” (also known as a “really good track”).

“Bittersweet Poetry” is bringing down the whole sentiment of this album, but it also gave us this interaction, so let’s call it an even trade.

We can also summarize sentiment across albums:

##                         Album Title Sentiment Mean Sentiment Variance
## 1               The College Dropout           0.16               5.06
## 2                 Late Registration           0.25               3.76
## 3                        Graduation           0.10               4.34
## 4                 808s & Heartbreak          -0.31               2.41
## 5 My Beautiful Dark Twisted Fantasy          -1.16               6.63
## 6                            Yeezus          -0.79               3.74
## 7                 The Life of Pablo          -0.25               5.31
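A summary table like this can be put together with dplyr. The sketch below assumes the mean and variance are taken over per-chunk scores within each album, which may not match exactly how the numbers above were computed, and it uses made-up scores for two placeholder albums.

```r
library(dplyr)

# Made-up chunk scores for two placeholder albums, not the real results
scores <- data.frame(
  album = rep(c("Album A", "Album B"), each = 3),
  score = c(1, -1, 3, -2, 0, -4)
)

# One row per album with the mean and variance of its chunk scores
album_summary <- scores %>%
  group_by(album) %>%
  summarise(sentiment_mean = round(mean(score), 2),
            sentiment_variance = round(var(score), 2))

print(album_summary)
```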

Today’s Material

Easy writing makes hard reading.
- Ernest Hemingway

Writing is failure. Over and over and over again.
- Ta-Nehisi Coates

Hope You Enjoyed!